Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription

Authors

Kishan Thambiratnam

Frank Seide

Abstract

This paper examines the application of lattice adaptation techniques to speaker-dependent models for conversational telephone speech transcription. Given sufficient training data per speaker, it is feasible to build adapted speaker-dependent models using lattice MLLR and lattice MAP. Experiments on iterative and cascaded adaptation are presented. Additionally, various strategies for thresholding frame posteriors are investigated, and it is shown that accumulating statistics from the local best-confidence path is sufficient to achieve optimal adaptation. Overall, an iterative cascaded lattice system reduced WER by 7.0% absolute, a 0.8% absolute gain over transcript-based adaptation. Lattice adaptation also narrowed the gap between unsupervised and supervised adaptation from 2.5% to 1.7%.
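To make the idea of posterior-weighted statistics concrete, here is a minimal sketch of lattice-style MAP mean adaptation with frame-posterior thresholding. It assumes per-frame Gaussian posteriors have already been computed (for example by a forward-backward pass over the recognition lattice); the function names, the prior weight `tau=10.0`, and the `best_only` option are hypothetical illustrations, not the authors' implementation. In particular, `best_only` is only one possible reading of "accumulating statistics from the local best-confidence path": each frame contributes solely to its highest-posterior Gaussian. Lattice MLLR is not shown.

```python
from collections import defaultdict


def accumulate_stats(frames, posteriors, threshold=0.0, best_only=False):
    """Accumulate zeroth- and first-order statistics per Gaussian.

    frames:     list of feature vectors (lists of floats), one per frame
    posteriors: list of dicts {gaussian_id: posterior}, one per frame
                (assumed to come from lattice forward-backward)
    threshold:  posteriors below this value are discarded
    best_only:  hypothetical mode: keep only the highest-posterior
                Gaussian in each frame
    """
    occ = defaultdict(float)  # sum_t gamma_t(m)
    first = {}                # sum_t gamma_t(m) * x_t
    for x, post in zip(frames, posteriors):
        items = list(post.items())
        if best_only and items:
            items = [max(items, key=lambda kv: kv[1])]
        for m, gamma in items:
            if gamma < threshold:
                continue
            occ[m] += gamma
            acc = first.setdefault(m, [0.0] * len(x))
            for d, xd in enumerate(x):
                acc[d] += gamma * xd
    return occ, first


def map_means(prior_means, occ, first, tau=10.0):
    """Standard MAP mean update: (tau * mu0 + sum gamma*x) / (tau + sum gamma).

    Gaussians that received no statistics keep their prior mean.
    """
    adapted = {}
    for m, mu0 in prior_means.items():
        g = occ.get(m, 0.0)
        s = first.get(m, [0.0] * len(mu0))
        adapted[m] = [(tau * mu0[d] + s[d]) / (tau + g) for d in range(len(mu0))]
    return adapted


if __name__ == "__main__":
    prior = {"a": [0.0], "b": [1.0]}
    frames = [[2.0], [2.0], [0.5]]
    posts = [{"a": 0.9, "b": 0.1}, {"a": 0.6, "b": 0.4}, {"b": 1.0}]
    occ, first = accumulate_stats(frames, posts, threshold=0.5)
    print(map_means(prior, occ, first))
```

Raising `threshold` or enabling `best_only` trades the soft, multi-hypothesis weighting of a full lattice for fewer, more confident statistics; with `threshold=0.0` and `best_only=False`, every lattice hypothesis contributes in proportion to its posterior.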

Similar resources

Conversational telephone speech recognition

This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Some major changes on the aco...

Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models

Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervi...

Automatically learning speaker-independent acoustic subword units

We investigate methods for unsupervised learning of sub-word acoustic units of a language directly from speech. We demonstrate that states of a hidden Markov model “grown” using a novel modification of the maximum likelihood successive state splitting algorithm correspond very well with the phones of the language. In particular, the correspondence between the Viterbi state sequence for unseen s...

Unsupervised speaker indexing using anchor models and automatic transcription of discussions

We present unsupervised speaker indexing combined with automatic speech recognition (ASR) for speech archives such as discussions. Our proposed indexing method is based on anchor models, by which we define a feature vector based on the similarity with speakers of a large scale speech database. Several techniques are introduced to improve discriminant ability. ASR is performed using the results ...

A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR

This paper presents an experimental study of a maximum likelihood (ML) approach to irrelevant variability normalization (IVN) based training and unsupervised online adaptation for large vocabulary continuous speech recognition. A moving-window based frame labeling method is used for acoustic sniffing. The IVN-based approach achieves a 10% relative word error rate reduction over an ML-trained bas...

Publication date: 2009
